The intersection of literature and big data

This rendering of the West Entrance to the Shakespeare Exhibition Hall is part of the ongoing renovation of the Folger Shakespeare Library in Washington, D.C.

Anupam Basu, Professor of English at Washington University in St. Louis, will give an online keynote lecture for the Folger Shakespeare Library. The lecture introduces some of the methods that inform his approach to traditional archival objects and shows ways to use digital tools for historical research. Registration is open for "Scaling the Archives: Curating the Corpus of Early English Print," which takes place online on Thursday, March 10 at 7:00 p.m. The talk is free and open to the public.

The NC State University Libraries, the College of Humanities and Social Sciences (CHASS), and the English Department co-sponsor the lecture series with the Folger, hosting some of its events while the Folger undergoes renovations. The seminar series is led and organized by English Department faculty in collaboration with the Libraries and is supported by NC State’s extensive digital technologies infrastructure, which includes the visualization and exhibition spaces at both Hunt and Hill Libraries. Currently, the lectures are held online due to pandemic safety measures.

Basu works at the intersection of literature and big data, drawing on emerging computational techniques like natural language processing and machine learning to make vast digital archives of early modern print more available for computational analysis. Basu is part of the team running the EarlyPrint Lab, which offers a range of online tools for the computational exploration and analysis of English print culture before 1700. The tools are applied to a corpus of more than 60,000 early English printed documents, comprising roughly 1.65 billion words.

Basu’s lecture is of general interest to anyone working in the digital humanities. He will speak about some of the challenges of curating historical corpora—large-scale archives of documents—and argue that making such corpora available for computation requires us to reevaluate editorial conventions, to theorize scale, and to accommodate new models of knowledge and uncertainty.

“Anupam Basu and his collaborators' EarlyPrint Lab is a case study in how digital tools can improve the research of large bodies of digitized printed material,” says Margaret Simon, Associate Professor of English at NC State. “In bringing together text, image, metadata, and collaborative tools, Basu's work innovates on the traditional archive—making scalable, distant reading possible while also creating tools that help us query digitized text at the level of the sentence or word.”